Geography, taxonomy, and ecological guild: Factors impacting freshwater macroinvertebrate gut microbiomes

Abstract Despite their diversity, global distribution, and apparent effects on host biology, the rules of life that govern variation in microbiomes among host species remain unclear, particularly in freshwater organisms. In this study, we sought to assess whether geographic location, taxonomy (order, family, and genus), or functional feeding group (FFG) designations would best explain differences in the gut microbiome composition among macroinvertebrates sampled across 10 National Ecological Observatory Network's (NEON) freshwater stream sites in the United States. Subsequently, we compared the beta diversity of microbiomes among locations, taxonomy (order, family, and genus), and FFGs in a single statistical model to account for variation within the source microbial community and the types of macroinvertebrates sampled across locations. We determined significant differences in community composition among macroinvertebrate orders, families, genera, and FFGs. Differences in microbiome compositions were underscored by different bacterial ASVs that were differentially abundant among variables (four bacterial ASVs across the 10 NEON sites, 43 ASVs among the macroinvertebrate orders, and 18 bacterial ASVs differing among the five FFGs). Analyses of variations in microbiome composition using the Bray–Curtis distance matric revealed FFGs as the dominant source of variation (mean standard deviation of 0.8), followed by stream site (mean standard deviation of 0.5), and finally family and genus (mean standard deviation of 0.3 each). Our findings revealed a principal role for FFG classification in insect gut microbiome beta diversity with additional roles for geographic distribution and taxonomy.


| INTRODUC TI ON
A sizeable portion of the Earth's diversity occurs in the microbiomes, the micro-organisms that live within host species (Hug et al., 2016;Louca et al., 2019). In studies of terrestrial insect gut microbiomes, stochastic and deterministic (at opposite ends of a spectrum) processes are essential to gut microbial assembly (Hanson et al., 2012;Jizhong & Daliang, 2017). Stochastic processes (or ecological processes) such as priority effects, dispersal limitation, and ecological drift have been determined to be responsible for variations in microbial community composition (or ecological processes [Jizhong & Daliang, 2017]). Similarly, various deterministic factors (nonrandom, niche-based mechanistic processes) also account for considerable variations in microbial community composition (free-living and host-associated). Deterministic processes mainly involve variation in environmental parameters (e.g., pH, salinity, DO, etc.), local habitat conditions, nutrient availability, and ultimately, co-evolution between host and associated gut microbiomes (phylosymbiosis) (Jizhong & Daliang, 2017). These insights have generated various conceptual frameworks to assess the dynamics governing community assembly (Vellend, 2010) and by extension, gut microbial community assembly in insect hosts (Brown et al., 2020). The rules of life that govern variation in microbiomes among host species remain unclear, particularly in freshwater organisms.
In studies of freshwater invertebrate gut microbiomes, there are conflicting results regarding the importance of habitat, taxonomy (family and genus), and functional feeding group (FFG) categorization in shaping gut microbial assemblages of freshwater macroinvertebrates. One study determined the taxonomy to be a more relevant determinant of gut microbiome composition of freshwater macroinvertebrates than local habitat, stream conditions, or FFG (Kroetsch et al., 2020). In contrast, the vast majority of the very limited studies have determined FFGs (regardless of taxonomic affiliation of samples invertebrates) to be the principal variable explaining differences in microbiome composition across freshwater macroinvertebrates (Ayayee et al., 2018;Kaufman et al., 2000;Pechal & Benbow, 2016;Receveur et al., 2020). Interestingly, location has not been determined to be a significant determinant of freshwater macroinvertebrate gut microbiome composition in neither a study of samples collected from two streams in the same region (Ayayee et al., 2018) nor samples collected from multiple sites along the reach of one major river (Kroetsch et al., 2020).
A significant limitation of these prior studies of freshwater macroinvertebrate gut microbiomes has been the lack of systematic sampling of freshwater ecosystems at a broad scale. In addition, most of these studies only sampled multiple sites within single streams or limited streams (1¬3) within one geographic region. The novel aspect of this study is that it is the first to sample multiple streams at a large geographical scale. We assessed the gut microbiomes of insects collected from 10 National Ecological Observatory Network's (NEON) freshwater stream sites. Subsequently, in a single statistical model, we compared the beta diversity of microbiomes among locations, taxonomy (order, family, and genus), and FFGs. This allowed us to account for variation within the source microbial community and the types of macroinvertebrates sampled across locations. Microbial community compositions are expected to vary due to the underlying geology, land usage, and stream conditions of these NEON sites (Atashgahi et al., 2015;Drury et al., 2013;Fang et al., 2017;Hosen et al., 2017;Kaushal et al., 2021;Medeiros et al., 2016;Wakelin et al., 2008). Subsequently, microbial community compositions are also expected to vary among FFGs, since they are known to have differences in gut physiology (Anderson & Cargill, 1987;Austin & Baker, 1988;Cummins, 1979;Martin et al., 1980;Martin, Martin, et al., 1981;Tierno de Figueroa et al., 2011), and consequently, associated gut microbiomes (Ayayee et al., 2018;Pechal & Benbow, 2016;Receveur et al., 2020) and functions (Stief, 2013;Stief et al., 2009).
In this study, we sought to assess whether geographic location, taxonomy (order, family, and genus), or FFG designations would best explain differences in gut microbiome composition among sampled macroinvertebrates across 10 streams in North America. We hypothesized that differences in gut microbial community composition (β-diversity) among sampled macroinvertebrates would be a function of deterministic factors (reflecting FFG designations) rather than stochastic factors (geographical location or taxonomical identification). This assumes that different macroinvertebrate taxa from different sites will have comparable gut physiologies, which will select for taxonomically comparable microbes, thus resulting in similar gut microbiomes.

| Sample collection
Aquatic insect samples were obtained from collections acquired by the National Ecological Observatory Network (NEON) sites in the United States in 2020 (https://www.neons cience.org/field -sites/ about -field -sites) (NEON, 2022). We obtained macroinvertebrate samples collected from 10 NEON-managed field sites across North America that ranged across 11 degrees of latitude with mean annual stream temperatures from 4.4 (BLDE) to 16°C (BLUE) (Figure 1).
Samples were collected from each of the 10 sites by NEON teams using conventional Surber samplers and D-frame nets (Parker, 2022).
At eight of the 10 sites, samples were collected in the fall (September, October, or November). At the remaining two sites (LEWI and MCDI), samples were collected in spring (March or June). Due to permit restrictions, we limited sample selection from NEON collections to five individuals per taxon per site. According to NEON protocols, macroinvertebrate samples were identified and verified to both species and family taxonomic levels (NEON, 2022;Parker, 2022) using published field guides and keys (Cummins, 2021;Merritt et al., 2008). The insect samples were stored in 1.5 ml tubes in 80% ethanol and stored at −20°C until DNA extraction. Upon receipt, insect samples were categorized into functional feeding groups using a combination of published (Cummins, 2021;Merritt et al., 2008) and online resources (https://www.macro inver tebra tes.org/).

| Sample processing, DNA extraction, and Illumina sequencing
Before DNA extraction, we surface-sterilized insect samples by washing them in a 1% detergent solution (Micro-90, Capitol Scientific, Inc) for 1 min, followed by two 1-min rinses in deionized water (DI) water. Additionally, we used the diluted detergent to surface sterilize the outside of the insect samples prior to any tissue homogenization for DNA extraction. Finally, the DNA quantity and quality both in-house and at the sequencing center indicated that using the diluted detergent did not affect the stability and structure of the DNA obtained from the insect samples it was used to sterilize. A previous study found no effect of storage or surface sterilization methods on gut bacterial community assessments (Hammer et al., 2015). If samples were large enough, then we dissected the entire dietary system; if not, we used the whole insect. Briefly, dissections were performed in a designated clean area. The dissecting tools were maintained in a 10% bleach solution during the entire dissection process and were rinsed in DI water every time before usage. The entire process of surface sterilizing, dissection, and DNA extraction was performed using gloves.
Next, we performed DNA extraction using the DNeasy Blood & Tissue Kit (Qiagen, Germantown, MD, USA) with modifications to the manufacturer's directions. We verified the presence of microbial 16 S rRNA marker gene in all extracted DNA samples via PCR using the universal 27F and 1492R bacterial primer pair (Frank et al., 2008). Samples were submitted for high-throughput pairedend MiSeq library preparation and sequencing at the University of Nebraska Medical Center Genomics Core. Briefly, a limited cycle PCR reaction was performed on each sample to create a single amplicon, including the V4 (515-F) and V5 (907-R) variable region (Keskitalo et al., 2017). The resulting libraries were validated using the Agilent BioAnalyzer 2100 DNA 1000 chip (Agilent), and DNA was quantified using Qubit 3.0 (Qubit™, Thermofisher). A pool of libraries was loaded into the Illumina MiSeq at 10 pM. The pool was spiked with 25% PhiX (a bacteriophage) at 10 pM for MiSeq run quality as an internal control (Mukherjee et al., 2015) to generate 300 bp paired ends with the 600-cycle kit (version 3). The raw reads were deposited into the Sequence Read Archive database (Accession number: PRJNA825559).

| Microbiome data processing and statistical analyses
Acquired fastq primer-trimmed MiSeq paired-end reads from the sequencing center were processed using DADA2 (Callahan et al., 2016).
Across both forward and reverse reads, filtering excluded reads with more than two expected erroneous base calls, any reads identified as part of the PhiX bacteriophage genome for quality control, and reads less than 175 base pairs. Forward reads were truncated to 250 base pairs, and reverse reads to 200 base pairs. Truncation was performed to maintain median quality scores above 30 across samples.
Reads were merged, and chimeras were subsequently filtered out.
We determined amplicon sequence variants (ASVs) and representative sequences against the SILVA 138.1 16 S rRNA gene reference database (Quast et al., 2012). We combined the count and taxonomy information for the generated ASVs into a classical ASV table, and further analyses were performed in QIIME v.1.8 (Caporaso et al., 2010;Kuczynski et al., 2012). Before analyses, we curated the table by removing unclassified reads at the bacterial or archaeal domain level, and any reads assigned as Eukaryotes. Finally, samples with less than 1000 reads per sample were removed from the table before analyses. We then summarized the filtered and curated ASV table to the family level (García-López et al., 2020, 2021, and all subsequent analyses were performed on this table.

F I G U R E 1
The distribution of the 10 NEON long-term ecological research freshwater stream sampling sites across the United States.
Briefly, we rarefied the family-level table to 1110 reads per sample and replicated 10 times across all samples. The rationale and justification for rarefying have been discussed in prior studies (Cameron et al., 2021;McKnight et al., 2019;Weiss et al., 2017). For alpha diversity, the chao1 (Huang & Zhang, 2013), Shannon's evenness (Shannon, 1957), and observed OTUs indices were calculated in QIIME, and significant differences among categorical groupings were determined via non-parametric Wilcoxon tests in JMP Pro 15 (S.A.S.). We generated the Bray-Curtis dissimilarity distance matrix (Bray & Curtis, 1957) using the 1110-rarefied table. The rationale for choosing the Bray-Curtis distance matrix has been previously outlined (Anderson et al., 2011). The calculated distance matrix was used to calculate the non-metric multidimensional scales (NMDS) in QIIME. The NMDS scales are used to visualize categorical sample groupings that differ in microbiome composition following a test of differences among these categorical groupings via permutational multivariate analysis of variance (PERMANOVA) (Anderson, 2017) in QIIME using the compare_categories.py command. We performed an indicator species analysis on the ASV table using the categories (insect order, FFG, location, etc.) to examine microbial community members most likely driving differences among microbiomes using the group significance command in QIIME (version 1.9) followed by a Kruskal-Wallis test. Figures were generated using JMP Pro 14 (SAS).

| Modeling of factors and microbiome composition
We compared the influence of freshwater insect taxonomy (orders, family, and genus), functional feeding groups, and geographical (i.e., NEON) sites on the beta diversity of associated gut microbiomes. To do this, we used intercept-only random effects models with the response variable (Bray-Curtis distance) as a measure of diversity and a Beta likelihood with a logit link. Each model contained a response (grand mean) and three varying intercepts (insect order [family and genus nested within this], functional feeding group, and NEON site), also known as random effects (McElreath, 2020). The structure of the model is as follows: Where y i is the Bray-Curtis value of the ith observation arising from a beta distribution with parameters alpha = • and beta = (1 − ) • .
In this parameterization, the mean is the primary target of inference and is represented by , while is a scalar. In addition to this model, we used model comparison (Hooten & Hobbs, 2015) to determine which predictor variable best explained beta diversity. We fit four models, each with a single fixed predictor of family, genus, functional feeding group, or stream site. The remaining predictors (i.e., those not specified as fixed effects) were included as varying intercepts. We then compared these models using the Watanabe-Akaike Information Criterion (WAIC) (Hooten & Hobbs, 2015). Priors for each parameter were chosen using prior predictive simulation (Wesner & Pomeranz, 2021) and are justified in the Appendix S1.
There are several advantages to this modeling approach (Dietze, 2017). First, the varying intercepts use partial pooling to pull each group's mean (i.e., mean of individual orders or FFGs or sites, etc.) toward the global mean. That provides conservative estimates of Bray-Curtis values for each group because the amount of pooling is determined by the amount of data in each group. In other words, it provides a correction for outliers such that groups with few data points are treated skeptically and pulled more strongly toward the overall mean (Efron & Morris, 1977). This is especially important for data like ours with relatively low replication within each group, helping to prevent spurious conclusions. Second, the residual variance of the grand mean of Bray-Curtis dissimilarity is partitioned among the varying intercepts. This allows us to identify the variables that contribute most of the variation in beta diversity, which are likely to be the best variables to focus on in future studies. Third, from these models, we can predict the diversity of the gut microbiomes at each level (e.g., for each species, stream, or functional feeding group). Fourth, the varying intercepts allow predictions, with proper uncertainty, of the microbiomes of insects that are not currently in our dataset (McElreath, 2020). Finally, varying intercepts automatically adjust for unbalanced data so that no single sample dominates the inference. We fit the model using Bayesian inference in R version 4.2.0 (R Core Team, 2022), with the brms package (Bürkner, 2017).
Posteriors were explored in rstan (Stan Development Team, 2022) with Hamiltonian Monte Carlo (No-U-Turn sampler). We used four chains with 2000 iterations each, and the first 1000 were discarded as a warmup. Model convergence was checked by ensuring that all R-hats were <1.1 and visually assessing the chains for mixing.

| Macroinvertebrate summary
We obtained 45 macroinvertebrate samples from 10 NEON sites across the continental USA, with each sample representing one to five individuals from a single genus or family. After removing four samples that did not meet sequence quality thresholds, our final dataset consisted of 41 samples (Table 1A)

| Freshwater macroinvertebrate gut microbiome diversity and composition
Overall, we obtained ~6.5 million reads from the sequencing effort. Filtering, merging, chimera removal, and curating of the resulting count and taxonomy table (removal of unassigned reads at the domain level and removal of samples with fewer than 1000 reads per sample) resulted in ~86% of reads retained (5,684,379 reads).
These were distributed across 41 samples yielding 12,658 ASVs (mean reads per sample = 19,468; minimum: 1235.000, maximum: 98,479.000). Rarefaction curves for the species diversity and richness indices at ~1110 reads per sample indicate that most microbial diversity had been sufficiently covered across samples ( Figure S1).
There were no significant differences among locations, taxonomy (order, family, genera), or FFGs for all four diversity indices evaluated ( Table 2). Further investigation of microbial community composition  (Figure 4b).
An unexpected but intriguing result from our microbiome analyses was the detection of the endosymbiont bacterium, TA B L E 1 A Summary of the macroinvertebrate samples obtained and used in this study.

| Assessment of variables shaping gut microbiomes
Among all samples, beta diversity ( (Table 4). For example, the standard error of delta WAIC overlapped for the top three models (genus, family, and FFG), as did the standard errors of R 2 (Table 4). In addition, no model had a mean of R 2 that was >0.5. This indicates that no single factor was dominant in explaining microbiome beta diversity in aquatic insects.

| DISCUSS ION
The nature and dynamics of insect-gut microbial associations are well established and understood for terrestrial insects (Dillon & Dillon, 2004;Douglas, 2015;Yun et al., 2014). For instance, there is a consensus on the impacts of diet, developmental stage, and environment on the gut microbiomes of terrestrial insects.
However, such consensus is lacking for aquatic insects and their associated gut microbiomes. For example, the gut microbiome of freshwater macroinvertebrates has been noted to differ from the surrounding environment under controlled laboratory studies (Ma et al., 2020) and field-collected samples (Ayayee et al., 2018;Pechal & Benbow, 2016;Receveur et al., 2020). All macroinvertebrate samples in this study were categorized into 10 geographic locations,  (Cummins, 2021;Gökçe, 2018). This approach allows for classifying hundreds of taxonomically different aquatic F I G U R E 2 Macroinvertebrate summary. The breakdown of freshwater macroinvertebrates was obtained from the 10 NEON sites, seven macroinvertebrate orders, 26 families, and five functional feeding group designations used in this study.
macroinvertebrates into relevant ecological units based on how they function and acquire food in aquatic ecosystems. Additionally, significant physiological differences exist among the various macroinvertebrate FFGs, further making them critical physiological units that can be relevant for structuring gut microbiomes of aquatic insects; a rationale proposed almost two decades ago (Harris, 1993).
Thus, the different gut conditions in the FFGs in this study may be driving resulting differences in associated gut microbiomes among freshwater macroinvertebrates. This is supported by the assertion that filter feeders (Trichoptera and Diptera) tend to have slightly acidic to alkaline gut pHs (Anderson & Cargill, 1987;Cummins, 1979;Martin et al., 1980;Martin, Martin, et al., 1981), grazers/collectors that feed on biofilm, such as Baetidae and Leptophlebiidae (order Ephemeroptera) tend to have neutral to slightly alkaline gut pHs (Austin & Baker, 1988), and predatory freshwater macroinvertebrates tend to have very alkaline gut pHs than other FFGs (Anderson & Cargill, 1987;Tierno de Figueroa et al., 2011).
The FFG classification is not without its drawbacks. As with any categorical classification, there can be substantial variation among individuals within a category. Most studies with aquatic insects can confidently identify taxa down to the order or family level. However, at these levels, there may be multiple FFGs within taxa, in addition to ontogenetic variation, and this can further complicate the assessment of gut microbiota. Overall, it stands to reason that FFGs with their inherent differences in gut physiology and digestive requirements (despite some overlap in food consumed) can serve as a good delineator of freshwater macroinvertebrate gut microbiomes.
While the possible mechanistic basis for microbiome variation among FFGs is well developed and supported by our data, as well as other studies (Ayayee et al., 2018;Receveur et al., 2020), there is also substantial variation among taxa and sites, but the mechanistic basis for this variation is less understood. Given the gradient of environmental conditions that our sites represent (e.g., rang-

F I G U R E 4
Relative abundance of differentially abundant bacterial ASVs at the family level that may be underscoring differences in microbiome community composition among freshwater macroinvertebrate orders (a) and functional feeding groups (FFGs) (b) used in this study.
The detection of Wolbachia in five of the freshwater macroinvertebrate orders in this study (Coleoptera, Odonata, Trichoptera, Plecoptera, and Ephemeroptera) supports previous work that concluded that Wolbachia could be considered a common endosymbiont of aquatic insects, with an incidence rate of 52% (Sazama et al., 2017). The detection of Wolbachia in Ephemeroptera, in this study, as in other studies (Sazama et al., 2017), was low compared to the other orders it was found in (low abundance from the microbiome data and no detection from the wsp-PCR study), Finally, the interplay between deterministic processes (nonrandom, species trait, niche-based mechanistic processes) and stochastic events (ecological processes, geographic location) in shaping microbiomes (host-associated and free-living) (Jizhong & Daliang, 2017) are increasingly being evaluated and considered in discussions of factors shaping the gut microbiome of insects.
These insights have generated various conceptual frameworks to assess dynamics governing community assembly (Vellend, 2010) and by extension, gut microbial community assembly in insect hosts (Brown et al., 2020). Of particular interest is the co-evolution between insect hosts (terrestrial or aquatic) and associated gut microbiomes and more importantly, the significance of the ecological designation (Functional feeding groups, FFGs) versus taxonomic designation of insect hosts on shaping gut microbiomes.
The results from our study perhaps lend support to deterministic mechanisms for gut microbiome assembly in freshwater macroinvertebrates, structured by ecological categorizations (functional feeding groups).
In conclusion, our study provides data in support of the existence of intrinsic processes that screen microbes from the surrounding source microbiota pool in streams before colonization and establishment in freshwater macroinvertebrate guts, predominantly underscored by the ecological classification of macroinvertebrates based on the mode of feeding, i.e., FFGs in these streams. As already mentioned, these processes may differ among FFGs due to the differences in gut physiologies, resulting in different gut microbiomes.
Next, we determined that the sampled streams also contributed to the observed variations in the gut microbiome composition of the sampled macroinvertebrates. Differences in site characteristics Note: Each model contains one fixed factor and three random effects. WAIC delta is the difference in WAIC scores between the best performing model and each subsequent model (along with the standard error of those differences). R 2 is the Bayesian version of standard R 2 , along with its standard error.
(geographic location of stream, stream condition, pH, salinity, etc.) may be impacting aquatic macroinvertebrate gut microbiomes, mainly via influencing the source microbiota pool in the water column (bacterioplankton) or the biofilm (on leaves, twigs, rocks, debris, etc.), in these streams. Finally, we determined that the taxonomy (order, family, and genus) of the sampled macroinvertebrates in this study accounted for the lowest variation in observed microbiome community composition. Overall, results suggest that future studies characterizing freshwater macroinvertebrate gut microbiomes would be best served by focusing on sampling representatives of multiple functional feeding groups within the same streams and from multiple streams, if possible.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data and code are available at https://github.com/jswes ner/neon_ micro biome and will be permanently archived with a DOI via Zenodo following acceptance.